Using tree-grammars for training set expansion in page classification

نویسندگان

  • Stefano Baldi
  • Simone Marinai
  • Giovanni Soda
چکیده

In this paper we describe a method for the expansion of training sets made by XY trees representing page layout. This approach is appropriate when dealing with page classification based on MXY tree page representations. The basic idea is the use of tree grammars to model the variations in the tree which are caused by segmentation algorithms. A set of general grammatical rules are defined and used to expand the training set. Pages are classified with a k − nn approach where the distance between pages is computed by means of tree-edit distance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prediction of melting points of a diverse chemical set using fuzzy regression tree

The classification and regression trees (CART) possess the advantage of being able to handlelarge data sets and yield readily interpretable models. In spite to these advantages, they are alsorecognized as highly unstable classifiers with respect to minor perturbations in the training data.In the other words methods present high variance. Fuzzy logic brings in an improvement in theseaspects due ...

متن کامل

Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms

acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar and Optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...

متن کامل

Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm

Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...

متن کامل

Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic

In this paper, two different ways of introducing alternation for lattice-valued (referred to as {L}valued)  regular tree grammars and {L}valued top-down tree automata are compared. One is the way which defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar and the other is the way which combines state with alternation. The first way is ta...

متن کامل

PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural <span style="font-variant: normal; font-style: norma...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003